Task-Based Algorithm for Matrix Multiplication: A Step Towards Block-Sparse Tensor Computing
Authors
Abstract
Distributed-memory matrix multiplication (MM) is a key element of algorithms in many domains (machine learning, quantum physics). Conventional algorithms for dense MM rely on regular/uniform data decomposition to ensure load balance. These traits conflict with the irregular structure (block-sparse, or rank-sparse within blocks) that is increasingly relevant for fast methods in quantum physics. To deal with such irregular data we present a new MM algorithm based on the Scalable Universal Matrix Multiplication Algorithm (SUMMA). The novel features are: (1) multiple-issue scheduling of SUMMA iterations, and (2) a fine-grained task-based formulation. The latter eliminates the need for explicit internodal synchronization; combined with multiple-issue scheduling, this makes it possible to tolerate the load imbalance caused by nonuniform matrix structure. For square MM with uniform and nonuniform block sizes (the latter simulates matrices with general irregular structure) we found excellent performance in the weak- and strong-scaling regimes, on commodity and high-end hardware.
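As an illustration of the iteration structure described above, here is a minimal serial sketch in Python. It is not the authors' implementation: the function and variable names are ours, None entries stand in for absent (block-sparse) blocks, and the (i, j, k) contractions that run sequentially here are exactly the units that become independent tasks in the fine-grained formulation.

import numpy as np

def summa_like_blocked_mm(A_blocks, B_blocks, nb):
    # Serial sketch of SUMMA's iteration structure: iteration k combines
    # block column k of A with block row k of B and accumulates their
    # outer product into C. Blocks may have nonuniform shapes; None
    # marks an absent block (block sparsity).
    C_blocks = [[None] * nb for _ in range(nb)]
    for k in range(nb):
        for i in range(nb):
            for j in range(nb):
                a, b = A_blocks[i][k], B_blocks[k][j]
                if a is None or b is None:
                    continue  # no data, so no work and no task
                # In the task-based formulation each (i, j, k) contraction
                # is an independent task; here they just run in order.
                if C_blocks[i][j] is None:
                    C_blocks[i][j] = a @ b
                else:
                    C_blocks[i][j] += a @ b
    return C_blocks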
Similar resources
Two-dimensional cache-oblivious sparse matrix-vector multiplication
In earlier work, we presented a one-dimensional cache-oblivious sparse matrix–vector (SpMV) multiplication scheme which has its roots in one-dimensional sparse matrix partitioning. Partitioning is often used in distributed-memory parallel computing for the SpMV multiplication, an important kernel in many applications. A logical extension is to move towards using a two-dimensional partitioning. ...
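For reference, the kernel this scheme optimizes is ordinary SpMV; a minimal CSR version is sketched below. The cited work additionally partitions and reorders the matrix for cache-oblivious access, which this plain loop does not show.

import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    # y = A @ x for A in compressed sparse row (CSR) form: row i owns
    # the entries values[row_ptr[i]:row_ptr[i+1]].
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y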
Time Integration of Tensor Trains
A robust and efficient time integrator for dynamical tensor approximation in the tensor train or matrix product state format is presented. The method is based on splitting the projector onto the tangent space of the tensor manifold. The algorithm can be used for updating time-dependent tensors in the given data-sparse tensor train / matrix product state format and for computing an approximate s...
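As background, the tensor train / matrix product state format that the integrator updates can itself be built by successive SVDs. The sketch below is the standard TT-SVD construction, not the projector-splitting integrator of the cited paper, and the names are ours.

import numpy as np

def tt_svd(t, tol=1e-10):
    # Decompose a full ndarray into tensor-train cores of shape
    # (r_prev, n_k, r_k) via successive truncated SVDs.
    dims, cores, r = t.shape, [], 1
    rest = t.reshape(r * dims[0], -1)
    for k in range(len(dims) - 1):
        u, s, vt = np.linalg.svd(rest, full_matrices=False)
        keep = max(1, int(np.sum(s > tol * s[0])))
        cores.append(u[:, :keep].reshape(r, dims[k], keep))
        rest = (s[:keep, None] * vt[:keep]).reshape(keep * dims[k + 1], -1)
        r = keep
    cores.append(rest.reshape(r, dims[-1], 1))
    return cores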
Fast sparse matrix multiplication on GPU
Sparse matrix multiplication is an important algorithm in a wide variety of problems, including graph algorithms, simulations, and linear solvers, to name a few. Yet there are only a few works on accelerating sparse matrix multiplication on a GPU. We present a fast, novel algorithm for sparse matrix multiplication, outperforming the previous GPU algorithm by up to 3× and CPU implementations by up to 30×...
An Efficient Fill Estimation Algorithm for Sparse Matrices and Tensors in Blocked Formats
Tensors, which are the linear-algebraic extensions of matrices in arbitrary dimensions, have numerous applications to data processing tasks in computer science and computational science. Many tensors used in diverse application domains are sparse, typically containing more than 90% zero entries. Efficient computation with sparse tensors hinges on algorithms that can leverage the sparsity to do ...
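The quantity being estimated, the "fill" of a blocked format, has a simple exact definition, shown brute-force below. The cited algorithm estimates this ratio by sampling rather than scanning every block, and the function name here is ours.

import numpy as np

def fill_ratio(A, r, c):
    # Fill of an r x c blocked format: entries stored when every block
    # containing a nonzero is stored densely (boundary blocks padded to
    # full size), divided by the number of actual nonzeros.
    stored = 0
    for bi in range(0, A.shape[0], r):
        for bj in range(0, A.shape[1], c):
            if np.count_nonzero(A[bi:bi + r, bj:bj + c]):
                stored += r * c
    return stored / np.count_nonzero(A)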
Fast Structured Matrix Computations: Tensor Rank and Cohn-Umans Method
We discuss a generalization of the Cohn–Umans method, a potent technique developed for studying the bilinear complexity of matrix multiplication by embedding matrices into an appropriate group algebra. We investigate how the Cohn–Umans method may be used for bilinear operations other than matrix multiplication, with algebras other than group algebras, and we relate it to Strassen’s tensor rank ...
Journal: CoRR
Volume: abs/1504.05046
Issue: -
Pages: -
Publication date: 2015